Chromosome-level genome of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus

Recent conservation efforts to protect rare and endangered aquatic species have intensified. Nevertheless, the ornate spiny lobster (Panulirus ornatus), which is prevalent in Indo-Pacific waters, has been largely ignored. In the absence of a detailed genomic reference, the conservation and population genetics of this crustacean are poorly understood. Here, we assembled a comprehensive chromosome-level genome for P. ornatus. This genome, among the most detailed for lobsters, spans 2.65 Gb with a scaffold N50 of 51.05 Mb, and 99.11% of the sequences were anchored to 73 chromosomes. The ornate spiny lobster genome comprises 65.67% repeat sequences and 22,752 protein-coding genes, with 99.20% of the genes functionally annotated. The assembly of the P. ornatus genome provides valuable insights into comparative crustacean genomics and endangered species conservation, and lays the groundwork for future research on the speciation, ecology, and evolution of the ornate spiny lobster.


Background & Summary
Lobsters are valuable marine resources, highly sought after in global fisheries for their economic and culinary significance. This has placed considerable focus on lobsters within the realms of biology, fisheries, and taxonomy 1. The marine lobster family presently encompasses 49 recognized species in 11 genera 2. Lobsters, notable for their large size among benthic invertebrates, are exceptionally long-lived, with some species estimated to live over 50 years and possibly up to 100 years 3. However, high market demand has resulted in intensive overfishing. Few countries have implemented effective management strategies to ensure sustainable harvests, and inadequate enforcement of fishing and marketing regulations has, in many regions, put significant strain on lobster populations. Consequently, to safeguard these valuable species and ensure their long-term sustainability, there is an urgent need to explore and implement alternative management approaches, such as co-management 4.
The ornate spiny lobster, Panulirus ornatus, is an endangered species found on coral reefs and in inshore habitats, and is widely distributed in China, the South Pacific, and the Indian Ocean (Fig. 1a,b). In global aquaculture, it ranks among the most valued and highly priced fishery species 5, and is consequently overexploited in unregulated fisheries 6,7. On February 5, 2021, P. ornatus was classified as a Second Class species on China's National Key Protected Wild Animals List, a notable conservation milestone that made it the first crustacean to be included in this protection list 8. Like many other valued marine species around the globe, the ornate spiny lobster population faces several critical threats, including marine environmental pollution, injuries from fishing activities, loss of vital habitats, and a decline in fish resources 9. The combined effects of global climate change and human activities exacerbate these challenges, posing significant risks to the survival and health of lobsters 10. In conclusion, the population size of P. ornatus is in decline, and further conservation measures for this species are imperative.
Previous attempts to sequence the genome of this species resulted in an incomplete and fragmented assembly, with an estimated genome size of 3.23 Gb compared to an assembled genome size of only 1.93 Gb and a contig N50 of 5,451 bp, limiting the depth of potential research 11. Here, we achieved the first chromosome-level genome assembly for an endangered lobster species by integrating Illumina short reads, PacBio long-read sequencing, and Hi-C technology (Fig. 2). The project amassed 182.90 Gb of Illumina short-read data, 115.67 Gb of PacBio continuous long read data, and 456.71 Gb of Hi-C data, culminating in an assembled genome size of 2.65 Gb and a scaffold N50 of 51.05 Mb (Tables 1 and 2). Our high-quality genome assembly enhances the genomic resources available for crustaceans and provides essential data for their further protection.

Sample collection and nucleic acid extraction.
We collected an adult male P. ornatus from Huangliu Co., LTD. in Sanya, Hainan, China. Muscle tissue samples were collected and washed three times with sterile phosphate-buffered saline (PBS). The samples were then immediately frozen in liquid nitrogen and stored at −80 °C. Total genomic DNA (gDNA) was extracted for the genome survey and construction of the genome sequencing libraries using the AMPure bead cleanup kit following the manufacturer's instructions (Beckman Coulter, High Wycombe, UK). In addition, we extracted total RNA from eight tissues (testis, intestines, hepatopancreas, hemocytes, muscle, gills, heart, and eyestalk) of the same individual using the TRIzol reagent according to the manufacturer's instructions, and subjected it to RNA-seq analysis for genome structure annotation. The integrity and quality of the extracted nucleic acids were evaluated using 1.5% agarose gel electrophoresis, and nucleic acid concentrations were quantified using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).

Library construction and sequencing.
A short-read library with an insert size of 350 bp was prepared with the NEBNext® Ultra™ DNA Library Prep Kit (NEB, USA) following the manufacturer's recommendations, and sequenced on the Illumina platform to generate 2 × 150 bp reads. For PacBio sequencing, we used genomic DNA to construct SMRTbell libraries following the manufacturer's guidelines. We then sequenced the libraries on a PacBio Sequel platform using single-molecule real-time (SMRT) technology. These sequencing efforts generated 182.90 Gb of Illumina short-read data and 292.02 Gb of raw continuous long reads (CLR), achieving 179-fold coverage of the P. ornatus genome (Table 1).
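The coverage figure above can be checked with simple arithmetic. The following is a minimal sketch, assuming that the 179-fold value was obtained by dividing the combined Illumina and CLR yield by the 2.65 Gb assembled genome size; the choice of denominator is an assumption, since the text does not state it, and the function name is illustrative only.

```python
def fold_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Return sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


illumina_gb = 182.90       # Illumina short-read yield reported in the text
pacbio_clr_gb = 292.02     # PacBio CLR yield reported in the text
assembled_size_gb = 2.65   # ASSUMPTION: assembled size used as the denominator

depth = fold_coverage(illumina_gb + pacbio_clr_gb, assembled_size_gb)
print(f"combined depth: {depth:.0f}x")
```

Under this assumption, the combined depth rounds to the reported 179-fold; using the k-mer-based size estimate as the denominator would give a lower value.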
For Hi-C library construction, we used the MboI restriction enzyme to digest cross-linked high molecular weight (HMW) gDNA. After 5′ overhang biotinylation and blunt-end ligation, we physically sheared the DNA into 300-500 bp fragments. Finally, we sequenced the Hi-C library with a 2 × 150 bp strategy on the Illumina NovaSeq 6000 platform, resulting in 456.71 Gb of paired-end raw reads. The RNA-seq libraries were constructed using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA), strictly following the manufacturer's recommendations. We then sequenced the RNA-seq libraries using the Illumina HiSeq 6000 platform to generate 2 × 150 bp reads, yielding 54.38 Gb of paired-end clean reads (Table 1).

Genome survey and assembly.
Before assembly, adapter sequences and low-quality reads from the Illumina platform were removed using fastp (version 0.23.1) 12, and only clean reads were retained for the genome survey and assembly. We conducted a genome survey to determine key genomic characteristics such as overall size, heterozygosity, and repeat content, employing SOAPec (version 2.01) 13 and GenomeScope (version 2.0) 14 to analyze 17-mer frequencies. From these analyses, with a dominant peak depth of 59, we estimated the genome size of P. ornatus to be 2,917.34 Mb. We also estimated the heterozygosity and repetitive sequence content of the genome at 0.92% and 63.86%, respectively. These findings and estimates are detailed in Table S1 and Fig. S1.
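The principle behind a k-mer-based size estimate can be sketched as follows. This is a simplified illustration, not the model fitted by GenomeScope: it assumes that genome size is approximated by the total number of k-mers divided by the depth of the dominant peak, and it uses a made-up toy histogram. The function name and the `min_depth` cutoff for discarding likely error k-mers are hypothetical.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> tuple[int, float]:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    Returns (peak_depth, estimated_size), where size = total k-mers / peak depth.
    """
    solid = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)                       # dominant peak
    total_kmers = sum(depth * count for depth, count in solid.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram (illustrative values only, not data from this study).
toy_histogram = {1: 1000, 2: 300, 58: 40, 59: 50, 60: 40}
peak, size = estimate_genome_size(toy_histogram)
print(peak, size)
```

In the toy data the low-depth bins (1 and 2) stand in for error k-mers, and the peak at depth 59 mirrors the dominant peak depth reported above.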
For genome assembly of P. ornatus, we employed two distinct assemblers, Wtdbg2 (version 2.5) 15 and Flye (version 2.9) 16, each of which produced an initial assembly using default parameters. We then refined both assemblies using Arrow (version 8.0) 17, a consensus algorithm that generates highly accurate consensus sequences from PacBio subreads. After polishing, we merged the Wtdbg2 and Flye assemblies using Quickmerge (version 0.3) 18, a tool designed to combine multiple genome assemblies into a single, unified consensus assembly. The merged assembly was then polished with two rounds of Arrow and two rounds of Pilon (version 1.22) 19 with default parameters, using PacBio subreads for Arrow and Illumina short reads for Pilon. This generated a total of 8,061 contigs with a total length of 2,651,872,113 bp (Table 2).
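Contiguity statistics such as the N50 values reported for this assembly follow a standard definition: the length of the sequence at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. The following is a minimal sketch with a toy list of lengths (not data from this study).

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches it at length 70.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

The same function applies to contigs or scaffolds; only the input lengths differ.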

Hi-C scaffolding.
In the Hi-C scaffolding phase, we first processed the raw Hi-C reads to remove adapters and low-quality bases using fastp (version 0.23.1) 20 with the parameters -q 20 -l 50. Subsequently, we aligned the processed reads to the preliminary assembly using the Juicer pipeline 21. Following alignment, we used the 3D-DNA pipeline 22 to group the contigs into chromosomes and to order and orient the contigs within each chromosome. To enhance the accuracy of the assembly, we manually corrected errors using Juicebox Assembly Tools (version 2.13.06) 21. Scaffolding anchored 2,628.95 Mb of the assembly to 73 chromosomes (Fig. 3), accounting for 99.11% of the total assembly (Table S2). The scaffold N50, a measure of assembly contiguity, reached 51.05 Mb in the final assembly (Table 2). This assembly is noteworthy for the contiguity of 14 chromosomes, each with no more than 30 gaps (Table 3).

Genomic repeat annotation.
We identified repeat sequences in the P. ornatus genome using both homology-based and de novo strategies 23. Initially, we merged the de novo predicted repetitive sequence database with the Repbase homologous repetitive sequence database 24. We used a suite of tools, namely RepeatScout (version 1.0.5) 23, RepeatModeler (version 2.0.1) 25, Piler (version 1.0) 26, and LTR-FINDER (version 1.0.6) 27, to identify transposable element (TE) families, and then employed RepeatMasker (version 4.1.0) 25, RepeatProteinMask (version 4.1.0), and TRF (version 4.0.9) 28 to classify the different repetitive elements. We achieved this classification by aligning the P. ornatus genome sequences with the integrated database. After eliminating redundant results from these three methods, we established that repeat sequences constituted 65.67% of the P. ornatus genome (Table S3). In addition, we calculated the Kimura divergence value of TEs using the script 'calcDivergenceFromAlign.pl' 29 and created TE landscapes with 'createRepeatLandscape.pl' 30. Among the identified repeat elements, DNA elements comprised 4.58% of the genome and long interspersed nuclear elements (LINEs) accounted for 40.30%. Short interspersed nuclear elements (SINEs) and long terminal repeats (LTRs) constituted 0.01% and 30.07% of the genome, respectively (Table 4 and Fig. 4).
To annotate noncoding RNA (ncRNA) in the P. ornatus genome, we employed specific tools for different types of ncRNA. For tRNA prediction, we used tRNAscan-SE (version 1.4) 31, whereas for rRNA prediction we used BLAST (version 2.2.26) 32. To identify other types of ncRNAs, such as miRNA and snRNA, we aligned the sequences to the Rfam database 33 using the INFERNAL tool (version 1.0) 34. Using these methods, we identified four distinct types of noncoding RNAs in the P. ornatus genome, including 12,771 miRNAs, 5,187 tRNAs, 1,716 rRNAs, and 1,296 snRNAs (Table 5).

Protein-coding gene prediction and annotation.
For gene structure prediction of the P. ornatus genome, we combined de novo, homology-based, and transcriptome sequencing-based predictions. For the de novo approach, we used a suite of tools, namely Augustus (v3.2.3) 35, GlimmerHMM (v3.02) 36, SNAP (v2013.11.29) 37, Geneid (v1.4) 38, and Genscan (v1.0) 39, to predict gene structures directly from the genome. Homology-based prediction relied on BLAST 32 and Genewise (v2.4.1) 40. With this multifaceted approach, we ensured a thorough and accurate prediction of the protein-coding genes in the P. ornatus genome, thereby enhancing our understanding of its genetic architecture. We identified between 5,087 and 58,220 homologous genes when comparing against the nine target species (Table 6). We analyzed the lengths of genes, CDS, exons, and introns in P. ornatus and compared them with those of five other species (Fig. 5). The average lengths for P. ornatus were 29,875.91 bp for transcripts, 1,420.49 bp for CDS, 257.65 bp for exons, and 6,300.84 bp for introns (Table S4). Clean RNA-seq data were processed with two assembly methods: genome-guided transcript assembly and de novo assembly using Trinity (version 2.11.0) 41. Open reading frames (ORFs) were identified using PASA (version 2.1.0) 42, and the gene sets predicted by the different methods were merged with EvidenceModeler (version 1.1.1) 43 into a comprehensive, non-redundant gene set containing 22,752 protein-coding genes (Table 7 and Fig. 6a).

Data Records
We deposited the genomic Illumina sequencing data in the SRA at NCBI under accessions SRR26801482 52 and SRR26801483 53.
We deposited the Hi-C sequencing data in the SRA at NCBI under accessions SRR26801479 to SRR26801481 64-66. This Whole Genome Shotgun project has been deposited at GenBank under the accession https://identifiers.org/ncbi/insdc.gca:GCA_036320965.1 67. The version described in this paper is version ASM3632096v1. The final chromosome assembly and genome annotation files are also available in Figshare 68.

Technical Validation
Evaluation of genome assembly and annotation.
We evaluated the quality of the P. ornatus genome assembly using multiple methods. First, in the Benchmarking Universal Single-Copy Orthologs (BUSCO) (version 3.0.2) 69 assessment, using the arthropoda_odb9 database of single-copy orthologous genes along with tblastn, augustus, and hmmer, 93.6% of the orthologs were complete and 3.2% were fragmented, indicating a comprehensive assembly (Table S5). Second, using the Core Eukaryotic Genes Mapping Approach (CEGMA) (version 2.5) 70, we found homologs in P. ornatus for 226 of the 248 highly conserved core genes (91.13%), further confirming the completeness of the assembly (Table S6). Finally, we aligned Illumina sequencing reads to the nuclear genome using BWA (version 0.7.8) 71, resulting in a high read mapping rate of 97.85% and a coverage rate of 96.80%, demonstrating the high integrity of the assembled genome as well as the homogeneity of the sequencing data (Table S7). Together, these findings indicate the high quality of the P. ornatus genome assembly.
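The completeness percentages above are simple ratios and can be reproduced directly. The following is a minimal sketch using only the counts and percentages given in the text; the combined BUSCO figure (complete plus fragmented) is derived arithmetic and is not a value reported by the authors.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# CEGMA: 226 of the 248 core eukaryotic genes had homologs (from the text).
cegma_completeness = percent(226, 248)
print(f"CEGMA completeness: {cegma_completeness:.2f}%")

# BUSCO: complete and fragmented fractions from the text; the sum is DERIVED here.
busco_complete = 93.6
busco_fragmented = 3.2
busco_detected = round(busco_complete + busco_fragmented, 1)
busco_missing = round(100.0 - busco_detected, 1)
print(f"BUSCO detected (complete + fragmented): {busco_detected}%")
print(f"BUSCO missing: {busco_missing}%")
```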

Collinearity analysis.
For whole-genome synteny comparison, we aligned the chromosome-level genomes of two decapod species, Penaeus chinensis and Procambarus clarkii, with the P. ornatus genome assembly, using LASTZ (version 1.02.00) 72 with default parameters. We found that nearly all 73 chromosome-level scaffolds of P. ornatus exhibited significant similarity with the corresponding chromosomes of P. chinensis and P. clarkii (Fig. 7). This similarity underscores the high quality of the sequencing and assembly of the P. ornatus genome and improves the reliability of phylogenetic analyses.
In conclusion, we successfully assembled a high-quality chromosome-level genome of P. ornatus. This newly generated reference genome represents a significant contribution to our knowledge of lobster genetic diversity. It will not only advance comparative evolutionary studies but also play a crucial role in conservation efforts for this endangered species.

Fig. 1 Photograph and location distribution of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus. (a) A photo of the adult P. ornatus. (b) A natural distribution map of P. ornatus (red star).

Fig. 3 Hi-C heatmap (200-kb resolution) showcasing the interaction frequencies between different chromosomes of the ornate spiny lobster.

Table 6.

Fig. 5 Comparisons of genomic elements of closely related species.

Fig. 6 Gene prediction and functional annotation of the P. ornatus genome. (a) Venn diagram of the gene set prediction. (b) Venn diagram of functional annotation based on different databases.

Fig. 7 Chromosome sequence synteny comparisons. (a) Syntenic relationship between the P. ornatus genome and the P. chinensis genome. (b) Syntenic relationship between the P. ornatus genome and the P. clarkii genome. Each line connects a pair of homologous sequences between the two species.

Table 1. Statistics of the sequencing data.

Table 2. Assembly statistics of the ornate spiny lobster.

Table 3. Assembly statistics for the chromosomes.

Table 4. Classification of repetitive sequences in the P. ornatus genome.

Fig. 4 Distribution of divergence rates for TEs in the P. ornatus genome.

Table 5. Classification of ncRNAs in the P. ornatus genome.

Table 7. Statistical analysis of functional gene annotations of the P. ornatus genome.